1. INTRODUCTION

Strabismus is an eye disorder in which the two eyes are not aligned in the same direction [1]; it often occurs
in childhood. It is mainly caused by a problem in the optic nerve, the brain, or the extraocular muscles [2].
Risk factors include familial inheritance [3] and premature birth, and the disease has a severe impact on
patients' lives. In addition, strabismus can prevent the brain from fusing the images collected by the two eyes,
which leads to amblyopia [4]. If left untreated, the vision of the affected eye can deteriorate, leading to
blindness [5]. Strabismus patients may also experience double vision, and their depth perception is lower than
that of healthy people. Therefore, the prognosis and treatment of strabismus are increasingly important, and
detection is the first and one of the most essential steps. Traditionally, strabismus is detected in hospitals.
Doctors use the Hirschberg test [6] to determine whether a patient has strabismus: a thin beam of light is
directed into the patient's eyes to check whether the corneal light reflex falls in the same place on both
corneas. As the number of patients with strabismus increases, manual detection becomes tedious and prone to
error. Automatic detection is therefore a useful and practical way to meet the growing demand for strabismus
detection [7]. For example, Horner et al. [8] use telemedicine to diagnose strabismus in places where specialists
are not available: patient images are taken with high-resolution digital cameras and then sent to specialists for
remote analysis and diagnosis.
In addition, several methods detect strabismus with digital tools. Attada et al. [9] use photorefraction to
detect small-angle strabismus. Loudon et al. [10] use the pediatric vision scanner to detect strabismus.
Almeida et al. [11] apply a digital camera and the Hirschberg test to identify strabismus. Valente et al. [12]
detect strabismus in digital videos using the cover test. Chen et al. [13] use an eye-tracking system and
convolutional neural networks to detect strabismus. Most previous studies rely on eye-tracking devices to
capture the position of the iris, or on high-resolution images captured by digital cameras. Moreover, their
classification step does not identify the type of strabismus. In this article, deep neural network models are
used to detect strabismus automatically and to specify its type, using images from different acquisition
sources (from low- to high-resolution digital images). In recent years, deep neural networks have been widely
adopted, supported by efficient training algorithms [14]-[20]. The rest of this paper is organized as follows:
Section 2 presents the theoretical background of the proposed method, Section 3 reports the main results
obtained on three datasets, and Section 4 concludes the study.


2. METHOD 
2.1. Dataset 

Three datasets comprising a total of 285 human facial images are used in this work. The first dataset, shown
in Figure 1(a), was captured with a low-resolution camera (i.e., a laptop or mobile phone camera), using a metal
stand to align the face angle with the camera. The second dataset, shown in Figure 1(b), was obtained from
ophthalmologists; all of its images were collected from patients using mobile phones. The third dataset was
obtained from previous studies and is called the impa-faced dataset [21], as illustrated in Figure 1(c). All
dataset images are scaled to a fixed size (e.g., 640x480 pixels). To achieve automated strabismus detection,
the datasets were carefully annotated by ophthalmologists. For model learning, we divided the images into two
subsets: a training set of 205 patient images (72%) and a testing set of 80 patient images (28%).
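The paper reports the 205/80 split but not how images were assigned to each subset. The sketch below assumes a reproducible random split over image identifiers; `split_dataset`, the seed, and the `img_###` names are hypothetical. It also shows that 205 of 285 images is about 71.9%, which the text rounds to 72%.

```python
import random

def split_dataset(items, n_train, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists.
    Hypothetical helper: the paper does not describe its splitting procedure."""
    if not 0 <= n_train <= len(items):
        raise ValueError("n_train must be between 0 and len(items)")
    rng = random.Random(seed)   # fixed seed -> the same split on every run
    shuffled = list(items)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 285 facial images.
image_ids = [f"img_{i:03d}" for i in range(285)]
train, test = split_dataset(image_ids, n_train=205)
print(len(train), len(test))  # 205 80
print(f"{len(train) / len(image_ids):.1%} / {len(test) / len(image_ids):.1%}")  # 71.9% / 28.1%
```

A stratified split (keeping the strabismus/normal ratio equal in both subsets) would be a reasonable alternative when the classes are imbalanced.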




Figure 1. Sample images from the datasets: (a) strabismus image from the real captured dataset, (b) strabismus
image from the dataset collected by specialists, and (c) a normal-vision image from the impa-faced dataset


2.2. Proposed model 

In this work, the proposed model uses a convolutional neural network (CNN) to learn deep feature vectors for
automatic strabismus detection. The model consists of two stages. First, the eye region is segmented from the
face using the Viola-Jones algorithm [22]. Second, the segmented eye regions are mapped into two output classes
(strabismus: 1 or normal: 0) according to the iris position of each eye. The general flow diagram of the
detection method is illustrated in Figure 2.
Figure 2. The general framework of the proposed algorithm

2.3. Segmentation of eye region

Using three datasets from different sources enhances the model training stage. Strabismus detection depends
primarily on locating the eye regions in the human face. For this purpose, we apply the Viola-Jones algorithm
to detect the eye region in the patient's face [23]. Further, we select a pre-trained classifier model from a
pool of classifiers to identify the eye-pair location while the face is aligned in front of the camera using a
metal stand. The algorithm detects the eye-pair region as a rectangle given by the coordinates (x, y, w, h),
where (x, y) is the top-left corner of the rectangle and (w, h) are its width and height, respectively. All
values are in pixels. More specifically, the detector finds the eye-region object by sliding a window over the
input image. The window size can be adjusted to detect objects at various scales, while the aspect ratio
remains fixed. The stages of the cascade classifier are designed to discard negative (non-eye) windows from the
image. The result is a bounding box obtained by applying Haar features (HF) to detect the eye-pair region, from
which the eye pair is extracted together with its class number. Figure 3 illustrates instances of segmented
eye-pair regions.
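Viola-Jones detectors evaluate Haar-like features in constant time using an integral image (summed-area table). The sketch below illustrates only that core step in pure Python; it is not the trained cascade the authors selected, and the function names (`integral_image`, `rect_sum`, `haar_horizontal_edge`) are hypothetical. A real cascade combines many such features with boosted thresholds across its stages.

```python
def integral_image(img):
    """Summed-area table padded with a zero row and column:
    ii[r][c] is the sum of img over rows < r and columns < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle with top-left corner (x, y), width w and height h,
    using only four table lookups regardless of the rectangle size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_horizontal_edge(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: upper half minus lower half.
    Large values indicate a bright band above a dark band."""
    if h % 2:
        raise ValueError("h must be even")
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

# Toy 4x4 window: bright upper rows, dark lower rows.
img = [[9, 9, 9, 9],
       [9, 9, 9, 9],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))              # 80
print(haar_horizontal_edge(ii, 0, 0, 4, 4))  # 64
```

Because each rectangle sum costs four lookups, the same feature can be evaluated cheaply at every window position and scale of the sliding-window search.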


2.4. Separation of eye region 

After obtaining the eye region (Figure 3), the next step is to separate the left and right eye segments. For
strabismus detection, both separated segments must have the same size. In practice, after separation we resized
each image to 42x22 pixels (width x height) and fed it to the CNNs as a training set. The target class number
encodes the position of the iris (0: center, 1: left, and 2: right), as shown in Figure 4(a) and Figure 4(b).
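The separation step can be sketched as cutting the detected eye-pair box into two halves of equal width and resizing each half to 42x22 pixels, the input size of Table 3. This is a minimal pure-Python sketch under stated assumptions: the image is a grayscale list of rows, the split is a simple vertical cut at the box center, and resizing uses nearest-neighbour sampling (the paper does not state its interpolation method). The helper names and the box values are hypothetical. For a frontal, non-mirrored photo, the image-left half shows the subject's right eye.

```python
def split_eye_pair(img, box):
    """Crop the eye-pair box (x, y, w, h) from a grayscale image (list of rows)
    and cut it vertically into two halves of equal width."""
    x, y, w, h = box
    crop = [row[x:x + w] for row in img[y:y + h]]
    half = w // 2                               # equal widths; a middle column is dropped if w is odd
    first = [row[:half] for row in crop]        # image-left half
    second = [row[w - half:] for row in crop]   # image-right half
    return first, second

def resize_nearest(img, out_w=42, out_h=22):
    """Nearest-neighbour resize to out_w x out_h (width x height)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

# Synthetic 640x480 grayscale image and a hypothetical eye-pair box.
img = [[(r + c) % 256 for c in range(640)] for r in range(480)]
first, second = split_eye_pair(img, (200, 180, 240, 60))
eye_a, eye_b = resize_nearest(first), resize_nearest(second)
print(len(eye_a), len(eye_a[0]))  # 22 42
```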




Figure 4. Segmented instances of (a) right and (b) left eye regions


2.5. Establishing convolution neural network 

After the eye regions are separated, a deep CNN is built to classify them. A CNN is made up of neurons arranged
in rectangular layers [24]. The spatial arrangement of neurons is the fundamental property of CNNs, which allows
them to be used in a wide variety of applications. Sparse connectivity, parameter sharing, and pooling are the
other essential properties of CNNs.
2.5.1. Sparse connectivity

Sparse connectivity means that each neuron in a CNN is connected only to a small local region of neurons in the
preceding layer. This is achieved by using a kernel that is smaller than the input. For example, in image
classification the input image may contain many pixels, but the kernels detect only a few useful local
features, such as edges and shapes. Sparse connectivity reduces the number of stored parameters, which is of
great importance to the efficiency of the network.
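Sparse connectivity can be seen directly in a "valid" 2-D convolution (implemented, as in most CNN frameworks, as cross-correlation) with stride 1 and no padding: each output value reads only a kh x kw patch of the input. The pure-Python sketch below uses the paper's 22x42 eye-image size and a 5x5 kernel, which yields the 18x38 map listed for Conv_1 in Table 3; the averaging kernel is an arbitrary illustrative choice.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation with stride 1 and no padding.
    Each output value depends only on a kh x kw patch of the input."""
    H, W = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(H - kh + 1):
        out_row = []
        for c in range(W - kw + 1):
            s = 0.0
            for i in range(kh):          # only the local patch contributes
                for j in range(kw):
                    s += img[r + i][c + j] * kernel[i][j]
            out_row.append(s)
        out.append(out_row)
    return out

# A 22x42 input (the eye-image size used in this paper) and a 5x5 averaging kernel.
img = [[1.0] * 42 for _ in range(22)]
kernel = [[1 / 25] * 5 for _ in range(5)]
out = conv2d_valid(img, kernel)
print(len(out), len(out[0]))  # 18 38
```

Changing a pixel far from the top-left corner leaves `out[0][0]` unchanged, because that output neuron is not connected to it.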


2.5.2. Parameter sharing 

To improve detection and reduce execution time, all neurons in the same feature map of a layer share the same
kernel parameters. In other words, one set of parameters is computed once instead of computing a separate set
of parameters for every location. Hence, the model can detect the same pattern wherever it appears in the input
image, which makes it more robust to shifted or slightly distorted patterns.
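The combined effect of sparse connectivity and parameter sharing can be quantified for the first layer of Table 3 (22x42x1 input, 18x38x20 output, 5x5 kernels). The sketch below counts weights plus biases for three hypothetical alternatives: a fully connected layer, a locally connected layer (sparse but unshared), and a convolutional layer (sparse and shared). Only the last one corresponds to the architecture in this paper.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: every output is connected to every input, plus one bias each."""
    return n_in * n_out + n_out

def locally_connected_params(out_h, out_w, kh, kw, c_in, c_out):
    """Sparse connectivity without sharing: every output position owns its own kernel and bias."""
    return out_h * out_w * c_out * (kh * kw * c_in + 1)

def conv_params(kh, kw, c_in, c_out):
    """Sparse connectivity with parameter sharing: one kernel and bias per output channel."""
    return kh * kw * c_in * c_out + c_out

# First layer of Table 3: 22x42x1 input -> 18x38x20 output with 5x5 kernels.
n_in, n_out = 22 * 42 * 1, 18 * 38 * 20
print(dense_params(n_in, n_out))                      # 12654000
print(locally_connected_params(18, 38, 5, 5, 1, 20))  # 355680
print(conv_params(5, 5, 1, 20))                       # 520
print(conv_params(5, 5, 20, 20))                      # 10020 (second convolutional layer)
```

The 520 parameters match Table 3 (weights 5x5x1x20 plus bias 1x1x20), roughly 24,000 times fewer than a fully connected alternative.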


2.5.3. Pooling 

Pooling aggregates the outputs of groups of neighboring neurons instead of convolving them. Max pooling, the
most commonly used pooling function, returns the highest value within each rectangular region. Generally
speaking, a CNN is composed of several convolutional and pooling layers, followed by one or more fully connected
layers. Convolutional layers discover the important features, while pooling layers keep task-relevant
information and discard irrelevant details [25]. Fully connected layers map the activations to the output
neurons, with each output neuron corresponding to one target class.
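A 2x2 max-pooling layer with stride 2, as used in Table 3, keeps the largest value in each non-overlapping 2x2 block. The minimal sketch below drops incomplete windows at the border (floor behaviour, no padding), which is consistent with the 5x15 to 2x7 reduction reported for Maxpool_2.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over size x size windows; incomplete border windows are dropped."""
    H, W = len(fmap), len(fmap[0])
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]
print(max_pool2d(fmap))  # [[4, 5], [3, 7]]
```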


2.6. Network architecture and training parameters 

In this paper, two CNN models have been developed: the first classifies the left-eye images and the second
classifies the right-eye images. Both models are composed of two convolutional layers, each followed by a batch
normalization layer, a rectified linear unit (ReLU) activation function [26], and a max-pooling layer. ReLU is
an activation function that helps optimize the quality of the network. Figure 5 shows the architecture, which
ends with a fully connected layer with three output neurons (see Table 3). To avoid overfitting, the dropout
strategy [27] is applied in the network layers. Batch normalization acts as a regularizer and has been reported
to reach the same accuracy with 14 times fewer training steps [28]. We use the stochastic gradient descent
method [29] with weight decay regularization to train the network. The dropout ratio is set to 0.5 and the
learning rate to 0.002.
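The training settings above can be illustrated with a plain SGD update that includes L2 weight decay, together with one common formulation of dropout (inverted dropout). This is a sketch under stated assumptions: the learning rate (0.002) and dropout ratio (0.5) come from the text, but the weight-decay coefficient is not reported, so `WEIGHT_DECAY = 1e-4` is hypothetical, and the paper does not say whether momentum was used (none is applied here).

```python
import random

LEARNING_RATE = 0.002  # stated in the paper
DROPOUT_P = 0.5        # stated in the paper
WEIGHT_DECAY = 1e-4    # hypothetical: the decay coefficient is not reported

def sgd_step(weights, grads, lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY):
    """One plain SGD update with L2 weight decay:
    w <- w - lr * (dL/dw + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

def dropout(activations, p=DROPOUT_P, rng=None):
    """Inverted dropout at training time: zero each unit with probability p and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged."""
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

new_w = sgd_step([1.0, -0.5], [0.5, 0.0], weight_decay=0.0)
dropped = dropout([1.0, 2.0, 3.0, 4.0])
print(new_w, dropped)
```

With p = 0.5, each surviving activation is doubled during training, so no rescaling is needed at test time.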
[Figure 5: input layer -> 5x5x20 convolutional layer -> 2x2 max-pooling layer -> 5x5x20 convolutional layer
-> 2x2 max-pooling layer -> fully connected layer (three outputs) -> output layer (softmax)]

Figure 5. Architecture of CNN network


2.7. Evaluation metrics 
To evaluate the performance of the model, we apply three well-known evaluation metrics: sensitivity (Se),
specificity (Sp), and accuracy (Acc), defined as:


$$\mathrm{Se} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$


In these metrics, true positives (TP) are strabismus images correctly identified as strabismus, and true
negatives (TN) are normal images correctly identified as normal. False positives (FP) are normal images
incorrectly identified as strabismus, and false negatives (FN) are strabismus images incorrectly identified as
normal. On the one hand, Se and Sp measure the algorithm's ability to recognize strabismus and normal images,
respectively; on the other hand, Acc reflects the overall classification performance [30].
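Equations (1) to (3) can be checked against the counts in Table 5 (TP = 125, TN = 28, FP = 4, FN = 3, i.e., 160 test decisions). The short sketch below, with a hypothetical helper name, reproduces the reported values Se = 0.97656, Sp = 0.875, and Acc = 0.95625.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy as defined in Eqs. (1)-(3)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return se, sp, acc

# Counts from Table 5.
se, sp, acc = classification_metrics(125, 28, 4, 3)
print(f"Se={se:.5f} Sp={sp:.3f} Acc={acc:.5f}")  # Se=0.97656 Sp=0.875 Acc=0.95625
```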


3. RESULTS AND DISCUSSION 

In this section, after training the CNN models, we conducted two experiments to evaluate the quality of the
network outputs. The first experiment evaluates how well the CNN models detect the iris position and classify
images into two classes (strabismus or normal). The second experiment evaluates the detection accuracy of the
CNN on three image classes by matching its outputs against the image classes labeled beforehand. Finally, we
conducted a focused comparison with other studies.


3.1. Training of CNN models 

To train the CNN models on the left- and right-eye training sets, we monitor two statistical measurements:
accuracy and mean square error (MSE). For the left eye, the training accuracy reaches 100% and the MSE is
0.0201, as shown in Table 1. For the right eye, the accuracy reaches 100% and the MSE is 0.0328, as shown in
Table 2.


Table 1. The training stage of CNN model for left eye images 


Epoch  Iteration  Time elapsed (hh:mm:ss)  Mini-batch accuracy (%)  Mini-batch loss  Base learning rate
1      1          00:00:00                 20.31                    1.5670           0.0020
50     50         00:00:17                 98.44                    0.0657           0.0020
100    100        00:00:36                 99.22                    0.0316           0.0020
150    150        00:00:57                 100.00                   0.0201           0.0020


Table 2. The training stage of CNN model for right eye images 


Epoch  Iteration  Time elapsed (hh:mm:ss)  Mini-batch accuracy (%)  Mini-batch loss  Base learning rate
1      1          00:00:00                 39.84                    1.2926           0.0020
50     50         00:00:20                 96.88                    0.1237           0.0020
100    100        00:00:39                 99.22                    0.0596           0.0020
150    150        00:00:59                 100.00                   0.0328           0.0020


Figure 6 shows the training accuracy as a function of the number of iterations for the left and right eyes.
For the left eye, the accuracy was 20.31% in the first epoch, rose to 98.44% at epoch 50 and 99.22% at epoch
100, and reached 100% in the final epoch. For the right eye, the accuracy rose from 39.84% in the first epoch to
96.88% at epoch 50 and 99.22% at epoch 100, and reached 100% in the final epoch, with a final MSE of 0.0328.

Figure 7 shows the MSE during training as a function of the number of iterations. For the left eye, the MSE
started at 1.5670 in the first epoch, decreased to 0.0657 at epoch 50 and 0.0316 at epoch 100, and reached
0.0201 in the final epoch. For the right eye, the MSE was 1.2926 in the first epoch, 0.1237 at epoch 50, and
0.0596 at epoch 100, and reached 0.0328 in the last epoch. We observed that the accuracy of both CNNs increases
with the number of training samples. With fewer than 100 training patterns, the detection results improve
significantly as training patterns are added; beyond 100 training samples, the detection results vary only
slightly as more patterns are added. Based on these observations, 205 training patterns are selected for
training the CNN. The CNN architecture is given in Table 3.
Table 3. The architecture of the CNN model


Layer no.  Layer name   Type                   Activations  Learnables
1          Imageinput   Image input            22x42x1      -
           22x42x1 images with 'zerocenter' normalization
2          Conv_1       Convolution            18x38x20     Weights 5x5x1x20, Bias 1x1x20
           20 5x5x1 convolutions with stride [1 1] and padding [0 0 0 0]
3          Batchnorm_1  Batch normalization    18x38x20     Offset 1x1x20, Scale 1x1x20
           Batch normalization with 20 channels
4          ReLu_1       ReLu                   18x38x20     -
5          Maxpool_1    Max pooling            9x19x20      -
           2x2 max pooling with stride [2 2] and padding [0 0 0 0]
6          Conv_2       Convolution            5x15x20      Weights 5x5x20x20, Bias 1x1x20
           20 5x5x20 convolutions with stride [1 1] and padding [0 0 0 0]
7          Batchnorm_2  Batch normalization    5x15x20      Offset 1x1x20, Scale 1x1x20
           Batch normalization with 20 channels
8          ReLu_2       ReLu                   5x15x20      -
9          Maxpool_2    Max pooling            2x7x20       -
           2x2 max pooling with stride [2 2] and padding [0 0 0 0]
10         Fc           Fully connected        1x1x3        Weights 3x280, Bias 3x1
           3 fully connected layer
11         Softmax      Softmax                1x1x3        -
12         Classoutput  Classification output  -            -
           Crossentropyex with '1' and 2 other classes
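The activation sizes in Table 3 follow from the standard output-size rule (size + 2*pad - kernel) // stride + 1. The sketch below (hypothetical helper names) propagates a 22x42x1 input through the two convolution/pooling blocks and confirms the 2x7x20 = 280 features feeding the fully connected layer, whose 3x280 weights and 3 biases give 843 parameters. Batch normalization and ReLU layers do not change the shape, so they are omitted.

```python
def out_size(size, k, stride=1, pad=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * pad - k) // stride + 1

def trace_shapes(h=22, w=42, c=1, filters=20, k=5, pool=2, n_classes=3):
    """Propagate (height, width, channels) through conv -> pool twice, then FC."""
    shapes = [("input", (h, w, c))]
    for i in (1, 2):
        h, w, c = out_size(h, k), out_size(w, k), filters
        shapes.append((f"conv_{i}", (h, w, c)))
        h, w = out_size(h, pool, stride=pool), out_size(w, pool, stride=pool)
        shapes.append((f"maxpool_{i}", (h, w, c)))
    fc_inputs = h * w * c
    shapes.append(("fc", (1, 1, n_classes)))
    fc_params = fc_inputs * n_classes + n_classes  # weights + biases
    return shapes, fc_params

shapes, fc_params = trace_shapes()
for name, shape in shapes:
    print(name, shape)
print("fc parameters:", fc_params)  # 843 (3x280 weights + 3 biases)
```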


3.2. Model testing: first experiment 

After training the CNNs to capture the iris position of each eye, we observed that the trained models achieved a
high sensitivity of 0.97656 and a specificity of 0.875. This indicates that the proposed models achieve good
detection performance in the testing stage when classifying normal and strabismus images. In this experiment,
we apply the test dataset to the trained CNN models to capture the position of the iris. This process achieved
an accuracy of 0.95625, which means that the CNN models can predict the classification class, as shown in
Table 4. In this table, the left-eye and right-eye columns give the class number predicted by each eye's CNN
model according to the iris position. That is, each eye region (left or right eye image) is divided into three
equal partitions (left, center, and right), and each partition is mapped to a specific class number (1: left,
0: normal, and 2: right). The target class gives the strabismus class number (0: no strabismus, 1: strabismus)
derived from the iris class numbers of the left and right eyes. The statistical measurements on the testing
dataset are summarized in Table 5.
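The paper does not spell out the rule that turns the two per-eye iris classes into the target class. Table 4 lists nine combinations, three of which are "No strabismus"; a natural reading, assumed here, is that no strabismus is declared when both irises fall in the same partition. The sketch below encodes that assumption using the class numbers from the text (0: center, 1: left, 2: right); `strabismus_target` is a hypothetical name.

```python
IRIS_CLASSES = {0: "center", 1: "left", 2: "right"}  # per-eye classes from the text

def strabismus_target(left_class, right_class):
    """Assumed rule: 0 (no strabismus) if both irises share a position, else 1."""
    for cls in (left_class, right_class):
        if cls not in IRIS_CLASSES:
            raise ValueError(f"unknown iris class: {cls}")
    return 0 if left_class == right_class else 1

for left in IRIS_CLASSES:
    for right in IRIS_CLASSES:
        print(IRIS_CLASSES[left], IRIS_CLASSES[right], strabismus_target(left, right))
```

Of the nine printed combinations, exactly three map to 0, matching the count of "No strabismus" rows in Table 4.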


3.3. Model testing: second experiment 

The trained CNN models are then used to classify the images of both eyes into three classes (normal, exotropia,
and esotropia), as shown in Table 6. In this table, the decision classes represent types of horizontal
strabismus. As mentioned earlier, the images from the three datasets are labeled by ophthalmologists. These
images are classified into three classes (1, 2, and 3) according to the direction in which the person's eye
looks: class 1 indicates an eye looking straight at the camera, class 2 an eye directed toward the right side
of the camera, and class 3 an eye directed toward the left side of the camera. The classification accuracy,
obtained by comparing the output with the label of each eye image, was 95.62%. Table 6 shows the strabismus
status of the person's eyes derived from the labeled image classes.
Table 4. Types of strabismus state according to left and right iris positions

Left eye  Right eye  Target class  Decision
1         1          0             No strabismus
1         2          1             Strabismus
1         3          1             Strabismus
2         1          1             Strabismus
2         2          0             No strabismus
2         3          1             Strabismus
3         1          1             Strabismus
3         2          1             Strabismus
3         3          0             No strabismus


Table 5. Performance summary of the proposed algorithm based on strabismus dataset

Architecture  TP   TN  FP  FN  Se       Sp     Acc
Network       125  28  4   3   0.97656  0.875  0.95625


Table 6. Status of strabismus types based on eye classes

Left eye  Right eye  Decision
1         1          Normal
1         2          Exotropia
1         3          Esotropia
2         1          Esotropia
2         2          Normal
2         3          Binocular squint
3         1          Exotropia
3         2          Binocular divergent
3         3          Normal


3.4. Comparisons with other studies 

The performance of the proposed method was compared experimentally with other studies in the related field; the
results are given in Table 7. In this table, each row corresponds to a study with a different dataset. For
example, Almeida et al. [11] report 94% accuracy on 45 patient images, whereas our system achieves an accuracy of
95.62% on three datasets totaling 285 images.


Table 7. Comparison with other studies 


Study                Number of patient images    Accuracy (%)
Almeida et al. [11]  45 images                   94
Valente et al. [12]  15 videos from 15 patients  93.33
Proposed study       285 images                  95.62


4. CONCLUSIONS 

Strabismus is an ophthalmologic disease with a significant impact on human life, and its detection plays an
important role in prognosis and treatment. Automated detection is an effective way to detect strabismus.
Concretely, automated detection enables rapid strabismus detection, in which medical data are collected and
then sent to specialists for physical diagnosis and examination. Three strabismus datasets are considered in
this article, and all collected images were labeled by ophthalmology specialists. A deep learning technique
based on a CNN model was applied for the automatic detection of strabismus. The method first uses the
Viola-Jones algorithm to segment the eye region and then classifies the segmented regions using CNNs. The CNN
output classes, which encode the position of the iris, are used as inputs to two experiments. The first
experiment trains a neural network on the segmented ocular regions to predict whether strabismus is present,
by matching the iris positions of both eyes. The second experiment predicts the type of strabismus (i.e.,
normal, exotropia, or esotropia). The results obtained with the proposed method are promising, and the
experiments show high rates of automatic strabismus detection for medical applications. In future work, we plan
to investigate other types of strabismus, such as V-pattern and vertical strabismus, and to study more clinical
cases.